Automatic Detection of Stop Words for Texts in Uzbek Language
Authors
Abstract
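Stop words are very important for information retrieval and text analysis investigation. This study aimed to automatically analyze and detect stop words in texts in the Uzbek language. Because of the limited availability of methods for the automatic search of stop words in Uzbek texts, we analyzed a newly prepared corpus. The Uzbek language belongs to the family of agglutinative languages. As with all agglutinative languages, the detection of stop words is a more complex process than in inflected languages. In inflected languages, words such as auxiliary words, articles, and prepositions can be included in the stop words group. In agglutinative languages, the meanings of such words are hidden in the text. Therefore, it is not appropriate to apply methods known for inflected languages directly to agglutinative languages. In this work, the “School corpus”, which contains 731156 words, has been investigated. The bigram method was applied to the corpus. We proposed the collocation method for detecting stop words in Uzbek. It is shown that the collocation method is 6 times better than the bigram method.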
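To make the bigram idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the tokenizer, the scoring rule (summing the counts of every bigram a word takes part in), and the function names `tokenize`, `bigram_counts`, and `candidate_stop_words` are all assumptions made for illustration, and the collocation method proposed in the paper is not reproduced here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens.

    Assumption: a word is a run of letters, optionally joined by an
    apostrophe (as in Latin-script Uzbek "o'zbek").
    """
    return re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", text.lower())


def bigram_counts(tokens):
    """Count adjacent word pairs (bigrams) in a token list."""
    return Counter(zip(tokens, tokens[1:]))


def candidate_stop_words(tokens, top_k=10):
    """Rank words by how often they take part in bigrams.

    Hypothetical scoring: each word receives the summed frequency of
    all bigrams in which it appears as the first or second element.
    """
    scores = Counter()
    for (first, second), count in bigram_counts(tokens).items():
        scores[first] += count
        scores[second] += count
    return [word for word, _ in scores.most_common(top_k)]


if __name__ == "__main__":
    # Toy input; "va" is the Uzbek conjunction "and".
    sample = "kitob va daftar va qalam va ruchka"
    print(candidate_stop_words(tokenize(sample), top_k=1))  # ['va']
```

On a real corpus the ranked list would only be a set of candidates; any cutoff for `top_k` would have to be chosen and validated separately.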
Similar resources
A MODEL FOR EVOLUTIONARY DYNAMICS OF WORDS IN A LANGUAGE
Human language, over its evolutionary history, has emerged as one of the fundamental defining characteristics of the modern man. However, this milestone evolutionary process through natural selection has not left any 'linguistic fossils' that may enable us to trace back the actual course of development of language and its establishment in human societies. Lacking analytical tools to fathom the cr...
Detection of Loan Words in Uyghur Texts
For low-resource languages like Uyghur, data sparseness is always a serious problem in related information processing, especially in some tasks based on parallel texts. To enrich bilingual resources, we detect Chinese and Russian loan words from Uyghur texts according to phonetic similarities between a loan word and its corresponding donor language word. In this paper, we propose a novel approa...
Automatic Author Detection for Turkish Texts
There are two ways to classify a text or to recognize its author: using the content of the text or using its style. In this study, 22 style markers were determined for each author. With the developed method, the author of a text can be determined using the style markers formed from a group of authors. The author group consists of 18 different authors, and an average success rate of 84% was obtained.
Automatic Stochastic Tagging of Natural Language Texts
Five language and tagset independent stochastic taggers, handling morphological and contextual information, are presented and tested in corpora of seven European languages (Dutch, English, French, German, Greek, Italian and Spanish), using two sets of grammatical tags; a small set containing the eleven main grammatical classes and a large set of grammatical categories common to all languages. T...
Automatic Detection of Antisocial Behaviour in Texts
A considerable amount of effort has been made to reduce the physical manifestation of antisocial behaviour (ASB) in communities. However, the key to the early detection of ASB is, in many cases, in observing its manifestations in written language, which has not been studied in detail. In this work, we search for linguistic features that pertain to ASB in order to use those features for the auto...
Journal
Journal title: Informatica
Year: 2023
ISSN: 0350-5596, 1854-3871
DOI: https://doi.org/10.31449/inf.v47i2.3788